Decision errors: False Positives

Elizabeth King
Kevin Middleton

This Week

  • Sampling from data sets: decision errors and predicting new data
    • Decision errors: False Positives
    • Decision errors: False Negatives
    • Prediction: Cross-validation

Decision errors

                 Reject H0       Fail to reject H0
H0 is true       Type I error    Correct
H0 is false      Correct         Type II error

False positive (Type I error):

  • You decide there is an effect when in reality there is not
    • By random chance, P falls below \(\alpha\), which is chosen before the test

False negative (Type II error):

  • You decide there is no effect when in reality there is
    • Depends on the value of \(\alpha\) & how “wrong” H0 is
    • Random chance leads to the estimated effect being smaller than it is in reality
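Both error types can be made concrete with a small simulation (a sketch, not from the slides; the sample size, effect size, and \(\alpha\) are arbitrary choices):

```r
set.seed(42)
niter <- 1000
alpha <- 0.05
p.null <- numeric(niter)  # H0 true: both groups drawn from the same mean
p.alt <- numeric(niter)   # H0 false: means differ by 1 SD

for (ii in 1:niter) {
  p.null[ii] <- t.test(rnorm(10, 0), rnorm(10, 0))$p.value
  p.alt[ii] <- t.test(rnorm(10, 0), rnorm(10, 1))$p.value
}

mean(p.null < alpha)  # Type I error rate: near alpha by construction
mean(p.alt >= alpha)  # Type II error rate: depends on n and effect size
```

The Type I rate is pinned near \(\alpha\) no matter what; the Type II rate shrinks as the sample size grows or as H0 becomes "more wrong."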

Uncertainty and Decisions

  • All decisions should be accompanied by estimates of their uncertainty
    • Intervals
    • Decision error rates
      • e.g., false positive rate, false discovery rate, false negative rate

Problems of Multiplicity

If you set a Type I error rate (\(\alpha\)) of 0.05 for any one test and then perform more than one such test on related data:

  • The overall Type I error rate for all your tests together (familywise) is greater than 0.05
  • The chance of erroneously rejecting at least one true null hypothesis is greater than 5%.
  • You will claim a significant effect when one does not exist.
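For \(k\) independent tests each run at \(\alpha\), the familywise rate is \(1 - (1 - \alpha)^k\); a quick check (not from the slides):

```r
alpha <- 0.05
k <- 1:10
fwer <- 1 - (1 - alpha)^k  # P(at least one false positive among k tests)
round(fwer, 3)
# Two independent tests already give 1 - 0.95^2 = 0.0975
```

By ten tests the familywise rate is about 0.4, even though each individual test still runs at 0.05.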

Problems of Multiplicity

set.seed(3210)
nn <- 10
group1.mean <- 6
group2.mean <- 6   # equal means: no true group difference
niter <- 1000
ps <- data.frame('p1' = numeric(length = niter),
                 'p2' = numeric(length = niter))

for (ii in 1:niter) {
  # Two independent "experiments", both with a true null
  yy1 <- c(rnorm(nn, group1.mean, 1), rnorm(nn, group2.mean, 1))
  yy2 <- c(rnorm(nn, group1.mean, 1), rnorm(nn, group2.mean, 1))
  gg <- c(rep('a', nn), rep('b', nn))
  # P-value for the group coefficient in each model
  ps[ii, 1] <- summary(lm(yy1 ~ gg))$coefficients[2, 4]
  ps[ii, 2] <- summary(lm(yy2 ~ gg))$coefficients[2, 4]
}

Problems of Multiplicity

What is the probability of a false positive for yy1?

mean(ps[, 'p1'] < 0.05)
[1] 0.044

Problems of Multiplicity

What is the probability of a false positive for yy2?

mean(ps[, 'p2'] < 0.05)
[1] 0.053

Problems of Multiplicity

What is the probability of a false positive for yy1 or yy2?

# Example rows: flag iterations where either test is a false positive
sm.set <- ps[c(8, 12, 13), ]
sm.set$FP <- ifelse((sm.set[, 'p1'] < 0.05 | sm.set[, 'p2'] < 0.05), "Yes", "No")

# Proportion of iterations with at least one false positive
length(which(ps[, 'p1'] < 0.05 | ps[, 'p2'] < 0.05)) / niter
[1] 0.095

The overall error rate = the family-wise error rate (FWER).

FWER vs. False discovery rate

Controlling FWER is appropriate when you want to guard against any false positives.

  • When might this be appropriate?

In many cases we can live with a certain number of false positives.

If so, the more relevant quantity to control is the false discovery rate (FDR).

False discovery rate

Proposed by Benjamini and Hochberg (1995).

  • Also see Curran-Everett (2000) for discussion

Controls the FDR (the expected proportion of false positives among tests declared significant), rather than the FWER

\[\mbox{FDR} = \frac{\mbox{n False Positives}}{\mbox{n All Positives}}\]

e.g., I’m OK with 5% false positives among the tests I judge as significant.

Note: False Positive Rate = \(\frac{\mbox{n False Positives}}{\mbox{n All Tests}}\)
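A toy example with hypothetical counts shows why the different denominators matter:

```r
# Hypothetical counts, for illustration only
n_tests <- 1000      # all tests performed
n_positives <- 100   # tests declared significant ("discoveries")
n_false_pos <- 20    # of those, truly null

FDR <- n_false_pos / n_positives  # 0.20 of discoveries are false
FPR <- n_false_pos / n_tests      # 0.02 of all tests are false positives
```

The same 20 false positives look very different depending on whether you scale them by the discoveries (FDR) or by all tests run (FPR).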

A menu of MCPs (multiple comparison procedures)

  1. Do nothing
    • Not an option
  2. Methods to control the Family-Wise Error Rate (FWER):
    • Multiple comparisons within a single linear model (e.g. Tukey, etc.; see QMLS1 08-2)
    • Bonferroni correction
      • Not recommended - overly conservative
    • Sequential Bonferroni procedure
    • Randomization procedures to empirically control FWER
  3. Methods to control the False Discovery Rate (FDR)
    • False Discovery Rate Methods
    • Positive False Discovery Rate Methods

FWER, FPR, and FDR can be estimated using Monte Carlo methods

Metabolomics in old and young killifish

Rows: 200
Columns: 10
$ Age <chr> "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "…
$ M1  <dbl> 4.135345, 4.266196, 4.937003, 4.652088, 3.425363, 4.932241, 4.7843…
$ M2  <dbl> 5.396550, 3.805441, 4.786611, 5.569076, 3.940508, 5.905152, 3.5324…
$ M3  <dbl> 3.794961, 3.961207, 5.728496, 6.156208, 6.322410, 4.837154, 3.4557…
$ M4  <dbl> 3.652372, 4.264235, 3.508127, 5.661753, 2.963517, 4.897702, 5.8498…
$ M5  <dbl> 4.368849, 6.529348, 6.431991, 5.454786, 4.498493, 5.481663, 4.3929…
$ M6  <dbl> 6.347962, 8.242852, 3.607855, 4.713970, 4.799455, 3.936193, 4.5323…
$ M7  <dbl> 5.105328, 5.887876, 4.824151, 4.951108, 5.684481, 1.991270, 6.8653…
$ M8  <dbl> 5.245491, 4.374724, 4.502716, 4.485565, 6.080971, 5.534854, 4.1364…
$ M9  <dbl> 5.444683, 6.884531, 5.950363, 4.195082, 5.104875, 5.251431, 4.5210…

Metabolomics in old and young killifish

getP <- function(fm) {
  # Extract the P-value for the group (2nd) coefficient from each
  # response's summary in a multi-response lm fit
  sum.set <- summary(fm)
  p.set <- lapply(sum.set, function(x) x[['coefficients']][2, 4])
  return(unlist(p.set))
}

mods <- lm(as.matrix(metaM[ , 2:1001]) ~ metaM[ , 1])

obsPs <- getP(mods)

ggplot(tibble(obsPs), aes(obsPs)) +
  geom_histogram(fill = "grey75") +
  xlab("Observed P-values") +
  geom_vline(xintercept = 0.05, color = "firebrick4", linewidth = 2)

Metabolomics in old and young killifish

What is our empirical false positive rate?

Choose a decision threshold (e.g., p < 0.05)

d.th <- 0.05

# Permute the rows of the metabolite matrix to break any association with Age
mods.s <- lm(as.matrix(metaM[sample(1:nrow(metaM)), 2:1001]) ~ metaM[ , 1])
sampPs <- getP(mods.s)

ggplot(tibble(sampPs), aes(sampPs)) +
  geom_histogram(fill = "grey75") +
  xlab("P-values") +
  geom_vline(xintercept = d.th, color = "firebrick4", linewidth = 2) +
  annotate(geom = "text", x = 0.2, y = 50, label = paste0(sum(sampPs < d.th),
                                                          " False Positives"),
           color = "firebrick4", size = 7)

What is our empirical false positive rate?

What is our empirical false positive rate?

Repeat 1000 times

[1] 0.049837
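The repetition behind that number is not shown on the slide; here is a self-contained sketch of the same idea, using simulated stand-in data rather than the killifish set (group labels, dimensions, and iteration count are invented for illustration):

```r
set.seed(1)
# Stand-in data: 40 samples x 50 "metabolites" with no true group effect
grp <- rep(c("Y", "O"), each = 20)
mets <- matrix(rnorm(40 * 50), nrow = 40)
d.th <- 0.05

niter <- 200
fpr <- numeric(niter)
for (ii in 1:niter) {
  # Permute rows to break any association between metabolites and group
  fm <- lm(mets[sample(1:40), ] ~ grp)
  ps <- sapply(summary(fm), function(x) x$coefficients[2, 4])
  fpr[ii] <- mean(ps < d.th)
}
mean(fpr)  # empirical false positive rate; near 0.05 under the null
```

Averaging the per-permutation false positive proportions gives the empirical rate, just as the slides' 1000 permutations of the killifish data give 0.0498.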

Estimate the False Discovery Rate

\[\mbox{FDR} = \frac{\mbox{n False Positives}}{\mbox{n All Positives}}\]

FPs <- mean(ctPs)
APs <- sum(obsPs < d.th)

FPs/APs
[1] 0.3924173

Vary the decision threshold

ths <- c(seq(1e-16, 0.001, length.out = 60),
         seq(0.002, 0.05, length.out = 40))
niter <- 1000
ctPs <- matrix(NA, niter, length(ths))

for (ii in 1:niter) {
  mods.s <- lm(as.matrix(metaM[sample(1:nrow(metaM)), 2:1001]) ~ metaM[ , 1])
  sampPs <- getP(mods.s)
  ctPs[ii, ] <- sapply(ths, function(x) sum(sampPs < x))
}

saveRDS(ctPs, file = "ctPs2.Rds")

Vary the decision threshold

5% FDR Threshold ~ 0.003

Vary the decision threshold

Familywise Error Rate (FWER)

FWER is the probability that at least one test rejects a true null hypothesis, i.e., of committing at least one Type I error.

fp.out$FWs <- apply(ctPs, 2, function(x) sum(x > 0))

fp.out |>
  ggplot(aes(thresholds, FWs)) +
  geom_point() +
  geom_hline(yintercept = 50, color = "firebrick4", linewidth = 2) +
  labs(x = "Thresholds")

Familywise Error Rate (FWER)

Familywise Error Rate (FWER)

Comparing Counts

# A tibble: 11 × 5
   thresholds False   All      FDR   FWs
        <dbl> <dbl> <int>    <dbl> <int>
 1   1   e-16 0        27 0            0
 2   1.69e- 5 0.016    53 0.000302    15
 3   3.39e- 5 0.039    55 0.000709    38
 4   5.08e- 5 0.061    57 0.00107     57
 5   6.78e- 5 0.073    58 0.00126     69
 6   1   e- 3 0.975    62 0.0157     626
 7   2   e- 3 1.95     65 0.0300     858
 8   3.23e- 3 3.17     65 0.0488     967
 9   4.46e- 3 4.36     68 0.0641     988
10   5.69e- 3 5.58     70 0.0798     999
11   6.92e- 3 6.78     72 0.0941    1000

Sampling for FWER

set.seed(6383783)
niter <- 1000
minP <- rep(NA, niter)

for(ii in 1:niter){
  mods.s <- lm(as.matrix(metaM[sample(1:nrow(metaM)), 2:1001]) ~ metaM[ , 1])
  sampPs <- getP(mods.s)
  minP[ii] <- min(sampPs)
}

saveRDS(minP, file = "minP.Rds")

Sampling for FWER

Sampling for FWER

quantile(minP, 0.05)
          5% 
2.733538e-05 

# The same threshold on a -log10 scale
l.minP <- -log10(minP)
quantile(l.minP, 0.95)
     95% 
4.563317 
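The 5% quantile of the null minimum P-value serves as a per-test decision threshold that controls the FWER at 5%: any observed P below it is significant familywise. A self-contained sketch with independent uniform P-values standing in for the permutation minima (the killifish threshold differs because its 1000 tests are not independent):

```r
set.seed(1)
# Null distribution of the minimum of 1000 independent uniform P-values,
# repeated 1000 times (stand-in for the permutation minima above)
minP <- replicate(1000, min(runif(1000)))

# Reject only tests whose observed P falls below this threshold
fwer.th <- unname(quantile(minP, 0.05))
fwer.th
# For independent tests this approximates 1 - 0.95^(1/1000) ~ 5.1e-05
```

Compare the P-values from the real (unpermuted) fit against `fwer.th`; the count below it is the number of discoveries at a 5% FWER.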

When to use Monte Carlo for estimating false positive rates?

References

Benjamini, Y., and Y. Hochberg. 1995. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. J. R. Stat. Soc. Series B Stat. Methodol. 57:289–300.
Curran-Everett, D. 2000. Multiple Comparisons: Philosophies and Illustrations. Am. J. Physiol. Regul. Integr. Comp. Physiol. 279:R1–8.